Locale

In computing, a locale is a set of parameters that defines the user's language, country and any special variant preferences that the user wants to see in their user interface. A locale identifier usually consists of at least a language identifier and a region identifier.

On Unix, Linux and other POSIX-type platforms, locale identifiers are defined similarly to BCP 47 language tags, but the locale variant modifier is defined differently, and the character set is included as part of the identifier. The format is [language[_territory][.codeset][@modifier]]. For example, Australian English using the UTF-8 encoding is en_AU.UTF-8.
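The identifier format above can be taken apart mechanically. The following is a minimal sketch in Python, not a reference implementation: the names PosixLocale and parse_posix_locale are hypothetical, and the regular expression only checks the overall shape, not whether the language, territory or codeset actually exist on a system.

```python
import re
from typing import NamedTuple, Optional


class PosixLocale(NamedTuple):
    """Components of a POSIX-style locale identifier (hypothetical helper type)."""
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]


# Mirrors language[_territory][.codeset][@modifier]; only the shape is checked.
_PATTERN = re.compile(
    r"^(?P<language>[^_.@]+)"
    r"(?:_(?P<territory>[^.@]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def parse_posix_locale(identifier: str) -> PosixLocale:
    """Split an identifier such as 'en_AU.UTF-8' into its parts."""
    match = _PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"not a POSIX-style locale identifier: {identifier!r}")
    return PosixLocale(**match.groupdict())


if __name__ == "__main__":
    print(parse_posix_locale("en_AU.UTF-8"))
```

Parts that are absent from the identifier come back as None, so a caller can tell "no codeset given" apart from an empty codeset.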

General locale settings

These settings usually cover display (output) formats, such as those for numbers, dates and times, currency and collation.

Locale settings govern how output is formatted for a given locale, so time zone information and daylight saving time are not usually part of them. Less common, but worth mentioning, is the input format setting, which is mostly defined on a per-application basis.
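To illustrate what "formatting output given a locale" means, the sketch below renders the same number under two sets of conventions. It is an illustration only: the FORMATS table, the NumberFormat type and format_number are hypothetical names, the table is hand-written for two locales rather than read from the system's locale database, and real applications would normally rely on the platform's locale facilities instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumberFormat:
    """Separators used when displaying numbers (hypothetical helper type)."""
    decimal_point: str
    thousands_sep: str


# Hand-written illustrative table; a real system reads this from locale data.
FORMATS = {
    "en_US": NumberFormat(decimal_point=".", thousands_sep=","),
    "de_DE": NumberFormat(decimal_point=",", thousands_sep="."),
}


def format_number(value: float, locale_id: str, places: int = 2) -> str:
    """Format value with the separators of the given locale."""
    fmt = FORMATS[locale_id]
    # Start from Python's fixed notation, e.g. '1,234,567.89' ...
    text = f"{value:,.{places}f}"
    # ... then swap both separators in a single pass so they cannot collide.
    table = str.maketrans({",": fmt.thousands_sep, ".": fmt.decimal_point})
    return text.translate(table)


if __name__ == "__main__":
    for locale_id in FORMATS:
        print(locale_id, format_number(1234567.891, locale_id))
```

The value itself never changes; only its presentation does, which is why input parsing and time zone handling are treated as separate concerns.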

Furthermore, the general settings usually include the keyboard layout setting.

Programming/markup language support

In many programming and markup languages, and in other (nowadays) Unicode-based environments, locale identifiers are defined in a format similar to BCP 47. They are usually defined with just ISO 639 and ISO 3166-1 alpha-2 codes.
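As a rough sketch of the simple case described here (a language code optionally followed by a two-letter region code), the Python function below normalizes such a tag to the conventional lowercase-language, uppercase-region spelling. The name normalize_simple_tag is hypothetical, and the check is purely syntactic: it does not cover the full BCP 47 grammar (scripts, variants, extensions) and does not verify that the codes are actually registered.

```python
import re

# A 2- or 3-letter language code, optionally followed by a 2-letter region.
# Both '-' and '_' are accepted as separators on input.
_SIMPLE_TAG = re.compile(r"^([A-Za-z]{2,3})(?:[-_]([A-Za-z]{2}))?$")


def normalize_simple_tag(tag: str) -> str:
    """Return tag in 'll' or 'll-RR' form, or raise ValueError."""
    match = _SIMPLE_TAG.match(tag)
    if match is None:
        raise ValueError(f"not a simple language[-region] tag: {tag!r}")
    language, region = match.groups()
    if region is None:
        return language.lower()
    return f"{language.lower()}-{region.upper()}"


if __name__ == "__main__":
    print(normalize_simple_tag("EN_au"))
```

Anything beyond the two-part form, such as a POSIX identifier carrying a codeset, is rejected rather than guessed at.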

POSIX-type platforms

On Unix, Linux and other POSIX-type platforms, locale identifiers are defined similarly to the BCP 47 definition of language tags, but the locale variant modifier is defined differently, and the character set is included as a part of the identifier.

The following example shows the output of the locale command for the Czech language (cs) in the Czech Republic (CZ) with explicit UTF-8 encoding:

$ locale
LANG=cs_CZ.UTF-8
LC_CTYPE="cs_CZ.UTF-8"
LC_NUMERIC="cs_CZ.UTF-8"
LC_TIME="cs_CZ.UTF-8"
LC_COLLATE="cs_CZ.UTF-8"
LC_MONETARY="cs_CZ.UTF-8"
LC_MESSAGES="cs_CZ.UTF-8"
LC_PAPER="cs_CZ.UTF-8"
LC_NAME="cs_CZ.UTF-8"
LC_ADDRESS="cs_CZ.UTF-8"
LC_TELEPHONE="cs_CZ.UTF-8"
LC_MEASUREMENT="cs_CZ.UTF-8"
LC_IDENTIFICATION="cs_CZ.UTF-8"
LC_ALL=
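In the output above only LANG is set explicitly; the quoted category values are the ones implied by it. POSIX resolves each category by checking LC_ALL first, then the category's own variable, then LANG. The sketch below models that lookup in Python; effective_locale is a hypothetical helper, and the final fallback to "C" is an assumption, since the default is implementation-defined.

```python
from typing import Mapping


def effective_locale(env: Mapping[str, str], category: str) -> str:
    """Resolve the locale for one category, e.g. 'LC_TIME'.

    Precedence: LC_ALL, then the category variable, then LANG.
    Unset and empty variables are treated the same way.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:
            return value
    # Assumption: fall back to the "C" locale when nothing is set.
    return "C"


if __name__ == "__main__":
    env = {"LANG": "cs_CZ.UTF-8", "LC_ALL": ""}
    print(effective_locale(env, "LC_TIME"))
```

Passing os.environ as env would apply the same lookup to the current process environment.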

The full list of POSIX locale codes [1] may be found on the Internet Assigned Numbers Authority (IANA) website [2].

Details of the IANA registry for language tag extensions [3] and IANA protocols [4] can also be found there.

Specifics for Microsoft platforms

The locale identifier (LCID) for unmanaged code on Microsoft Windows is a number such as 1033 for English (United States) or 1041 for Japanese (Japan). These numbers consist of a language code (lower 10 bits) and a culture code (upper bits), and are therefore often written in hexadecimal notation, such as 0x0409 or 0x0411. The list of those codesets is described in character encoding. Microsoft is beginning to introduce managed code application programming interfaces (APIs) for .NET that use this format. One of the first to be generally released is a function to mitigate issues with internationalized domain names,[1] but more are in Windows Vista Beta 1.
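The bit layout described above can be checked with a few lines of Python. This is a sketch that follows the article's description only (language code in the lower 10 bits, culture code in the bits above); split_lcid is a hypothetical name, and any further fields that a full LCID may carry are not modelled.

```python
from typing import Tuple


def split_lcid(lcid: int) -> Tuple[int, int]:
    """Split an LCID into (language code, culture code).

    The language code is the lower 10 bits; the culture code is
    whatever remains above them.
    """
    language = lcid & 0x3FF  # mask keeps the lower 10 bits
    culture = lcid >> 10     # shift drops the lower 10 bits
    return language, culture


if __name__ == "__main__":
    for lcid in (1033, 1041):
        language, culture = split_lcid(lcid)
        print(f"{lcid} = 0x{lcid:04X}: language 0x{language:02X}, culture 0x{culture:02X}")
```

Decimal 1033 and hexadecimal 0x0409 are the same value, which is why the hexadecimal spelling makes the two fields easier to read off.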

Beginning with Windows Vista, new functions that use BCP 47 locale names have been introduced to replace nearly all LCID-based APIs.
